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Abstract 

Finite detector resolution and limited acceptance require to apply unfolding 
CN 1 methods in high energy physics experiments. Information on the detector reso- 

lution is usually given by a set of Monte Carlo events. Based on the experience 
\ with a widely used unfolding program (RUN) a modified method has been 

kjr)' developed. 

The first step of the method is a maximum likelihood fit of the Monte Carlo 
distributions to the measured distribution in one, two or three dimensions; the 
^•O . finite statistic of the Monte Carlo events is taken into account by the use of 

Barlows method with a new method of solution. A clustering method is used 
t-h ' before to combine bins in sparsely populated areas. In the second step a regu- 

larization is applied to the solution, which introduces only a small bias. The 
^ \ regularization parameter is determined from the data after a diagonalization 

O ' and rotation procedure. 

OO : 

O ■ 

(N ' 1. THE UNFOLDING PROBLEM 

o . 

obtain f(x) by a simple histogram of the quantity x. With real detectors the determination of f(x) is 
' complicated by three effects: 

■ 

• Limited acceptance: The probability to observe a given event, the detector acceptance, is less 
\ than 1. The acceptance depends on the kinematical variable x. 

• Transformation: Instead of the quantity x a different, but related quantity y is measured. The 
transformation from x to y can be caused by the non-linear response of a detector component. 

• Finite resolution: The measured quantity y is smeared out due to the finite resolution (or limited 
measurement accuracy) of the detector. Thus there is only a statistical relation between the true 
kinematical variable x and the measured quantity y. 

The really difficult effect in the data correction for experimental effects, or data transformation 
from y to x is the finite resolution, causing a smearing of the measured quantities. Mathematically the 
relation between the distribution f(x) of the true variable x, to be determined in an experiment, and the 
measured distribution g(y) of the measured quantity y is given by the integral equation, 



A standard task in high energy physics experiments is the measurement of a distribution f(x) of some 
qjj ■ kinematical quantity x. With an ideal detector one could measure the quantity x in every event and could 



(1) 



called a Fredholm integral equation of the first kind. In practice often a known (measured or simulated) 
background contribution b(y) has to be added to the right-hand side of equation ([I]); this contribution is 
ignored in this paper. The resolution function A(y, x) represents the effect of the detector. For a given 
value x = xq the function A(y, xq) describes the response of the detector in the variable y for that fixed 
value xo- The problem to determine the distribution f(x) from measured distributions g(y) is called 
unfolding; it is called an inverse problem. Unfolding of course requires the knowledge of the resolution 
function A(y, x), i.e. all the effects of limited acceptance, transformation and finite resolution. 



In addition to the imperfections of the detector, there may be further effects between x and y, 
which are outside of the experimental control, even with an ideal detector. One example are radiative 
effects, which in experiments are often corrected afterwards (radiative corrections), but are in their effect 
similar to detector effects. If the true kinematical quantity is defined at the parton level, further effects 
from the fragmentation process of partons to the (observable) hadrons influence the measured quantity 
y. All these effects are of statistical nature. 




Fig. 1: The Monte Carlo simulation of the effects of limited 
acceptance, transformation and finite resolution. Shown 
is the original (true) distribution (line histogram) and the 
"measured" distribution (yellow/shaded histogram). 



A typical example for these effects is shown in Figure [| taken from a Monte Carlo simulation 
of all three effects. By unfolding an estimate of the original distribution has to be determined from the 
distorted measured distribution. Details on the Monte Carlo simulation are given later in section [y, 
where the unfolding in this example is discussed in detail. 

For the numerical solution of equation ([]]) the distributions have to represented by a finite set of 
parameters. One posssibility is to represent the distributions by histograms, and the resolution function 
by a matrix. Equation ([!]) can then be represented by the matrix equation 



y = Ax , (2) 



which has to be solved for the vector x, given the vector y (data histogram). The vector y with n 
elements represents a histogram of the measured quantity y, and the distribution f(x) is represented by 
a histogram of the vector x with m elements. The variables y and x may be multidimensional, and the 
multidimensional histograms can be mapped to ra-bin (x) and m-bin histograms (y), respectively. The 
transition from x to y is described by the n-by-m matrix A. The element a%j is related to the probability 
to observe an entry in histogram bin i of the histogram y, if the true value x is from histogram bin j of 
the histogram x. Problems with standard solutions are discussed in the next section. 

In high energy physics experiments the problems is even more difficult than in other fields. Often 
the statistics of the measurement is not high and every y-bin content (which is distributed due to the 
Poisson distribution around the expected value) has a large statistical fluctuation. Furthermore the reso- 
lution function A(x, y) (or the matrix A) is not known analytically, but is represented by a data set from 
Monte Carlo simulation of the process, based on some assumed distribution fuc{x), 

9uc{y) = J A(y, x)/mc(^) dx , (Monte Carlo simulation) (3) 

and is also statistically limited. Standard methods for the solution of integral equations or linear equations 
can not be used in this case. 

A simple method like the so-called bin-by-bin correction may be meaningful if the measurements 
y are very close to the true values x. Real unfolding methods, taking all the correlations into account, 
are essential if there are larger effects of transformation and finite resolution. A solution x has to be 
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found, with small deviations between the elements of Ax and the elements of the actually measured 
histogram y. In the maximum likelihood method a function F(x) is constructed as the negative log of 
the Likelihood function, which describes the statistical relations between data and result: 

F(x) = - log L(x,y, A) (4) 

and the minimum of F(x) is determined. Wildly fluctuating results x are due to large (negative) corre- 
lations between adjacent bins and are not acceptable. The approach to get a more reasonable solution 
is to impose a measure of the smoothness on the result x; this method is called regularization. This 
technique was proposed independently by Phillips [§] and by Thikhonov [J3j] - For a function f(x) the 
integrated square of the second derivative 




is often used in the regularization which in the linearized version of the problem can be expressed by a 
quadratic form C(x) = x T Cx with a positive-semidefinite matrix C (derivatives are replaced by finite 
differences). Equation (|j) is then modified to the form 

F(x) = - log L(x,y, A) +t-C(x) (6) 

with a factor r called regularization parameter. 

The result of the minimization of the modified function F(x) of equation (||) will show smaller 
fluctuations than the result obtained from equation (Q) and may be more useful to compare the measure- 
ment with theoretical predictions. However it is clear that unavoidably the regularization introduces a 
bias. The magnitude of the bias depends on the value of regularization parameter r. A very large value 
would result in a linear function f(x) or distribution x, respectively. It is clear that the method requires 
an a-priori knowledge about a smooth behaviour of f{x). The function fmc{x) used in the Monte Carlo 
simulation of equation ([|) is often very close to the final result f(x), i.e. the ratio is rather smooth. This 
suggests to express f(x) in the form /(x) = /mc(^) x /mult {%) and to rewrite equation ([I]) in the form 

g(y) = J i A (y, x )fMc(x)] f ma i t (x)dx . (7) 

For the discretized form the function /mc(^) can be absorbed in a redefinition of matrix A and the 
vector x is interpreted as discretization of the hopefully smooth function f mu \t(x). With this redefinition 
the equation (||) can remain unchanged. The program RUN [|], ||] for regularized unfolding is available 
since almost two decades and has been used in many experiments; early applications are H and [0]. It 
is based on the reinterpretation of matrix A and x, as described above, and includes a method for the 
determination of the regularization parameter r based on the available degrees of freedom. In the method 
described later in this paper some details are treated differently. 



2. UNFOLDING AS AN ILL-POSED PROBLEM 



The problems inherent to unfolding are discussed in a simple special case, assuming a resolution matrix 
A with some smearing of data into neighbour bins. Assuming a true vector x the product y = Ax 
describes the distribution expected due to the migration effect. With the same dimensions for the vectors 
x and y the matrix A is a square matrix and in the example later in this section the following simple 
symmetric form is assumed for the matrix A, which depends on a single parameter e (e = migration 
parameter); for a 5-by-5 matrix the form is 
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A direct solution for x, given a measurement y differing from the expectation Ax with the true vector 
x by statistical fluctuations, is possible with inversion of the matrix A: 

estimate x = A l y error propagation V(x) = A~ l V y (A _1 ) T . 

The result has certain good statistical properties, for example it has no bias: E [x] = A~ 1 E[y] = 
A~ 1 AE [x] = x. In practice the result is however satisfactory only for a matrix A with dominating 
diagonal; the result looks terrible if the matrix A describes a large migration to neighbour bins. Conse- 
quently the problem is called an ill-posed problem. In the following the solution of the equation y = Ax 
using an orthogonal decomposition is discussed; this will allow some insight into the unfolding problem. 

The symmetric matrix A is expressed by 

A = UDU T (9) 

with a transformation matrix U with property U T U = 1, and a diagonal matrix D, where the diagonal 
elements of matrix D are the eigenvalues Xj of matrix A (in the order of decreasing value). The trans- 
formation matrix U contains the corresponding eigenvectors with the eigenvector Uj in the j-th column. 
The condition number k of a matrix is defined by the ratio of eigenvectors k = A max /A m i n ; the value of 
k is important for the quality of unfolding (see below). For values above e = 0.20 the condition number 
k is very rapidly increasing. 

A transformation of equation y = Ax to a new basis is done by multiplication with matrix U T 
(which is a rotation in the n-dimensional space): 

U T • | y = Ax = UDU T x . 

U T y = DU T x -> c = Db. 

=b 

The matrix U of eigenvectors uj allows to transform the vectors x and y to vectors b = U T x and 
c = U y, and to transform these vectors back by x = Ub and y = Uc. The transformed equation 
c = Db with the diagonal matrix D shows, that each of the coefficients bj and Cj is transformed 
independently of any other coefficient by the simple relation Cj = Xj ■ bj . This operation does not depend 
on any assumption of the solution x, and depends only on the properties of the matrix A. Folding 
(x — > y) and unfolding (y — ► x) is multiplication and division by the eigenvalues Xj, respectively, of 
the coefficients in the transformed space. 

In order to unfold a measured vector y, the vector is transformed by c = U T y to coefficients 
Cj, which have values influenced by statistical fluctuations of the elements of vector y. In the unfolding 
the coefficients Cj are divided by the eigenvalues Xj to obtain bj = Cj/Xj; the statistical fluctuation of 
coefficient Cj is magnified for small eigenvalues Xj (i.e. Xj <C 1). Eventually, for very small eigenvalues 
Xj, the final result x = Ub will be dominated by one or by few of the coefficients bj with small 
eigenvalues and large statistical errors, and the complete result is unsatisfactory. 

Example. In a numerical example the matrix A has the form of equation <^> with n = 20 and a value of 
the migration parameter of e = 0.22. The first eigenvalue is Ai = 1.0, and the last one is A20 = 1/7.9, 
giving a condition number k = 7.9. For x the ideal distribution of Figure ||a is assumed; the underlying 
function is of the form xexp(— ax). The decomposition of the matrix A according to equation @ is 
performed and the coefficients bj and Cj are calculated. These coefficients are shown in Figure |3]a (with 
bj > Cj). In addition this figure shows, calculated by standard error propagation, the almost constant 
error level of the coefficients, of the folded distribution of Figure |^a with Poisson distributed bin contents. 
Figure |3|a shows, that the coefficients bj of the true distribution decrease rapidly with increasing value j 
of the index of the coefficient, by roughly three orders of magnitude. The coefficients Cj of the folded 
distribution drop even faster, because it is more smooth due to the migration effect. Of course the relation 
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Fig. 2: Original (true) distribution (a) and two results from unfolding ((b) and (c)). Result (b) has been obtained from all 20 
coefficients, and for result (c) a sharp cut-off after 10 coefficients has been applied (i.e. the coefficients 1 1 to 20 are ignored). 




Fig. 3: The absolute values of coefficients bj and Cj are shown for j — 1,2,... 20. The coeffients bj and Cj for the true 
distribution and the folded distribution (without measurement errors) are shown in (a), together with the (almost constant) error 
estimate for the coeffients bj calulated by error propagation. The coeffients Cj from the simulated measured distribution are 
shown in (b), together with the error estimate. For j above 12 the smaller coefficients of the folded distribution become smaller 
than the statistical error. In (b) the coefficients for j above 12 are dominated by statistical errors and even the sign is not 
determined by the data. 



bj/cj = Xj is valid. The last coefficient bj in Figure |3|a is reduced to Cj by the inverse of the condition 
number of the matrix, which is k = 7.9 in this case. 

The components of the first eigenvector u\ (eigenvalue = 1) are all the same. Thus the coefficients 
bi and Cj are identical, and proportional to the total sum of the measured distribution, not at all influenced 
by the migration. If visualized by functions, interpolating the components the eigenvector Uj (eigenvalue 
Xj) has j — 1 zeros, and the curvature of the visualized eigenvectors is rapidly increasing with index 
j. The components of the last eigenvector u n have alternating sign for the bins; it has a small absolute 
value and its measured value will have a large relative statistical error. The value of 620 is obtained by 
620 = 7.9 • C20 in unfolding, introducing a large bin-to-bin oscillation into the result of unfolding. 

In a simulation Poisson distributed bin contents are assumed in the measurement vector y. The 
coefficents for this measured distribution are shown in Figure ||b, together with the level of the statistical 
error. As expected from the size of the errors all coefficients with an index above about j = 12 are 
dominated by the statistical error and therefore do not significantly contribute to the information content 
of the measurement. For indices above j = 12 even the sign of the coeffient can not be determined by 
the measurement. 

Using all the "measured" coefficients for the unfolding the result of Figure ^b is obtained. This 
result shows large fluctuations around the expected values shown by the curve. The fluctuations are due 
to the contributions from indices above j = 12, which represent noise and are magnified in the unfolding 
because of the large values of their inverse eigenvalues. The result is clearly unsatisfactory. 

Because all measured coefficients Cj with j above a value of 12 are dominated by statistical errors 
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(noise) their use in the unfolding makes no sense. A sharp cut-off after index j = 12 or even after index 
j = 10 will not remove any useful information from the measurement. The unfolding result using only 
measured coefficients Cj up to j = 10 is shown in Figure ||c; compared to Figure ||b the large fluctuations 
are suppressed and the results seems to be acceptable. Of course the fine structure of the true distribution 
expressed by the true coefficients bj with j > 10 is not included in the solution and this may represent a 
bias. It is an unavoidable bias because these coefficients can not be measured. 

The covariance matrix of the result can be calculated by standard error propagation. However it 
is clear that the covariance matrix is singular and has only rank 10 in this case, because the 20 bins are 
obtained from 10 measured coefficients (10 degrees of freedom). This property is inherent to the cut-off 
method and to the regularization method, and was already mentioned in [Q]. Such singularity of the 
covariance matrix can be avoided if the final transformation is to a number of bins identical to the degree 
of freedoms; only a limited number of bins can be obtained in a measurement with large miration effects. 

This method of using a sharp cut-off has to be compared to the regularization method. It has been 
shown [Q] that the use of a regularization function of the type of equation (|J) is equivalent to a smooth 
cut-off; essentially the measured coefficients Cj are multiplied by a factor depending on the curvature of 
the orthogonal contributions (see section |3]).[] 

3. THE PROPOSED UNFOLDING METHOD 

The proposed method is similar to the method used in RUN; the differences are emphasized in this 
section. It is expected that the proposed modifications results in more stable solutions. The proposed 
method requires large dimension parameters in the resolution matrix A. Like in RUN the regulariza- 
tion is determined by the required number of degrees of freedom, which determines the regularization 
parameter. 

Figures in this section refer to the example already mentioned in section [I] In a Monte Carlo 
calculation of all three effects, limited (x-dependent) acceptance, non-linear transformation and finite 
resolution are simulated. Details on the function and the distorting effects are identical to the published 
examples In total 100 000 "events" are simulated for "data" and for the MC defining matrix A. The 
input function /mc(^) (equation (^|)) is a constant. 

In RUN the discretization for f(x) and for A(y, x) was done using cubic B-spline functions; the 
effect is the same as for simple histograms namely the integral equation is transformed to a system of 
linear equations, however the elements of the vectors are B-spline coefficients instead of bin contents. 
The advantage is that the parametrized solution is a smooth function and the curvature as defined by 
equation (||) can be exactly written as a quadratic form. However the accurate determination of matrix A 
requires a good Monte Carlo statistic. In RUN statistical fluctuations of the elements of matrix A could 
not be treated. 

Simple histograms are instead proposed here; the elements of the vector y are bin contents (integer 
numbers). The curvature of the solution is constructed by finite differences: the second derivative in bin 
j is proportional to Xj—\ — 2xj + Xj+\. In a histogram some resolution is lost if bins with a width as large 
as expected for the final resolution would be used. It is recommended to use initially m = 2naf bins for x 
for a final number of degrees of freedom of n^. For y a larger number of bins n (> m) is recommended, 

1 Sometimes the iterative solution of unfolding problems expressed by the equation y = Ax is proposed in the literature 
without explicit regularization, starting from a "good" initial distribution for x. Of course equations of this type (with a square 
matrix) have a unique solution and iterative solutions are slow compared to the direct solution; after a large number of iterations 
with convergence the same unsatisfactory result as by direct solution will be obtained. However in these proposals only a small 
number of iterations is recommended. It can be shown that iterative methods can in fact include an implicit regularization 
|^]: there is a different speed of convergence for the various orthogonal contributions and the small contributions with a small 
eigenvalue will converge very slowly. Thus after a few iterations the (large) coefficients with large eigenvalues are already 
accurate; the remaining coefficients are still almost unchanged and thus, for a stop after few iterations, their values are still 
close to the starting values. There is of course some subjectivity in stopping "early" enough. 
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to avoid a loss of resolution. Thus the number nxmof elements is large, and a large sample of Monte 
Carlo events is required to fill matrix A. The statistical error of the elements eventually can not be 
neglected. 

Standard Poisson maximum likelihood fit. Ignoring initially eventual statistical errors of the elements 
the expected number of events in bin i of y is given by = X^j=i a «i x j ■ For the expected 
number yi, as given by this expression, the observed values % follows the Poisson distribution. Optimal 
estimates for the elements Xj are obtained by minimizing the (negative) logarithm of the total likelihood 
with respect to the elements Xj of vector x, assuming the Poisson distribution: 



Fix) = -hxC(x) = - In 



f[PyM 



i=l 



^2 (Vi Vi) + const - , (!0) 



i=l 



where the constant term containing e.g % ! can be ommited. This expression ( flO| ) correctly accounts also 
for bins with a small number of histogram entries 

An alternative would be to use the (linear) least squares method with singular value decomposi- 
tion for the fit. However for small number of entries the use of the Poisson distribution seems to be 
essential. Furthermore the diagonalization used later in the method is almost equivalent to singular value 
decomposition (eigenvalues are the squares of the singular values). 

Fitting with finite Monte Carlo samples. The problem of statistical fluctuations of the elements ay- 
has been neglected so far. A method to treat the problem within the maximum-likelihood method has 
been developed by R.Barlow and Chr.Beeston [g]. In this method there is for each source bin Xj some 
(unknown) expected number of events Aij. For each element Aij the corresponding number a^ from the 
Monte Carlo sample is generated by a distribution which is taken to be Poisson too. The nice feature of 
this method is that a bias which would be introduced by ignoring the statistical character of the values of 
the elements a^ is avoided and the maximum likelihood error is more realistic. A large number of slack 
variables (one for each bin) is introduced and has to be treated in the optimzation. A new fast numerical 
solution method has been developed (see [jl|]). 

Combining bins. The likelihood function is a sum over all bins. Combining almost empty bins does not 
introduce a systematic error. The total number of elements of the matrix may be large, especially if x 
and/or y are multidimensional, and a small number of entries (or even zero) in an element may not be 
uncommon. The combination of almost empty bins is done with a cluster algorithm, taking into account 
the distance between bins in one, two or three dimensions. 

First option: sharp cut-off of orthogonal contributions. This method is rather similar to the method 
discussed in section [T| The computational problem is to determine the minimum of F(x) (see equation 
(|T0|)). The standard iterative method is based on the representation for the correction Ax 

F{Ax) = ^Ax T HAx + Ax T g + ... (11) 

with the Hessian H (matrix of second derivatives of F(Ax)) and the gradient vector g (first derivatives 
of F(Ax)). A Newton step is then calulated from equation HAx + g = 0. Convergence is usually fast 
for good starting values and the covariance matrix is equal to the inverse of the Hessian. The starting 
values can be calculated by a linear least square fit, based on the approximation of the Poisson distribution 
by a Gaussian distribution for each bin. 

A sharp cut-off as discussed in the example of section ^] requires a diagonalization of the sym- 
metric matrix H by H = UD U T with a diagonal matrix D and a transformation matrix U. By 
a transformation (rotation) in ai-space linear combinations of the ^-components are obtained with a 
diagonal covariance matrix, with variances of the linear combinations given by the inverse of the eigen- 
values of matrix D. A cut-off is done at some index j followed by backtransformation to the a;-space of 
bin-contents using the transformation matrix U. 
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Second option: regularization. In this option the regularization is based on the second derivative of 
the result according to equation (||), which can be expressed by a quadratic form x T Cx with a positive- 
semidefinite matrix C. In principle the same procedure is used as in RUN; the mathematical details are 
given elsewhere [ffl]. Here a simple explanation is given on the standard mathematical operations^ used. 
Regularization is done by adding the term r • x T Cx to the function F(Ax) of equation (|l~ij). Exactly as 
in the first option the Hessian is diagonalized. 



H = UD U T 



H 



UD^U 1 



UD 



-1/2 



(12) 



Up to this step everything is identical to the cut-off option. Using transformation matrix UD^ 1 ^ 2 the 
vector x is transformed to linear combinations x, which are orthogonal, with all variances equal to 1 
(unit covariance matrix). Because the covariance matrix is equal to the unit matrix, every additional pure 
rotation will not change the (unit) covariance matrix. In terms of the transformed vector the regularization 
term can now be written in the form r • x Cjjx, where C\j is the transformed curvature matrix C. Now 
another diagonalization can be done of matrix Cjj: 



T ■ X T Cx 



^ r n 

T • X CjjX 



x T U c S U T c x 



(13) 



with a diagonal matrix S and a rotation matrix U "q. This diagonalization can be used to define a pure 
rotation from the linear combination x to another linear combination x 



x 



x 



U T c x 



(14) 



The components of the new vector x still have the unit matrix as covariance matrix. The complete 
transformation from x to x is the effect of the transformation by UD^ 1 ^ 2 and by U c- The algebra can 
be explained in other words: the error ellipsoid related to the Hessian is first rotated to have the axes 
parallel to the axes of the new system. By a change of the scales the ellipsoid is transformed to a sphere, 
which will remain a sphere for any further rotation. A last rotation is done to bring the axes into the order 
of increasing curvature. 




Fig. 4: Selected column vectors of the complete trans- 
formation matrix defined in the regularization procedure. 
They correspond to the curvature eigenvalues S33, S44 and 
516,16 • Visualization is done by curves interpolating the 
components. The amplitude associated which each vector 
all have the same standard deviation of 1 . 



Some columns of the complete (product) transformation are shown in Figure ^. All linear com- 
binations obtained have the same precision (standard deviation of the coefficient is one). As seen in the 
Figure linear combinations with large index j are oscillating with large amplitude. The diagonal elements 
Sjj are the (statistically independent) contributions of the elements of x to the total curvature. Sorted 
according to increasing value of Sjj the value of Sjj will increase rather fast with increasing index j. The 
spectrum of eigenvalues Sjj is shown in Figure |5| In terms of the linear combinations x regularization 
is simply given by 



(xA es =[ 1 + T . s .. )(xj)^ ■ d5) 
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Fig. 5: The eigenvalues after the curvature transformation. The values are very rapidly increasing for orthogonal contributions 
for increasing index value (left). The amplitudes before (left bars) and after regularization (right bars). The statistical error of 
all amplitudes is equal to 1, which is indicated by the horizontal line. The vertical scale is linear at the bottom and makes a 
transition to a logarithmic scale at the top (right). 



and this simple form is the reason for the transformations made before. 

Determination of the regularization parameters r. The first factors (small j) on the right-hand-side 
of equation (|l^) will be close to 1 ; for a value r = 1/ Skk the factor will be 1/2 and for indices j > k 
will rapidly decrease towards zero. The sum of all factors can be called the effective number of degrees 
of freedom, and can be used to determine the value of the regularization parameter r from the required 
number of degrees of freedom, i.e. the regularization parameter r is determined from the value of in 
the equation 




Thus the required number of degrees of freedom has to be specified and determines the degree of regu- 
larization. This number can be taken from the spectrum of the coefficients or amplitudes, shown in Figure 
||. The insignificant part (large j) is clearly visible in the spectrum and separated from the significant 
part (small j). The selected value of should be equal to or larger than the number of significant 
terms. The unregularized amplitudes, which have standard deviation one, are shown by the left bars; 
amplitudes above index 15 are compatible with one and represent noise. They would however make a 
large contribution to the solution, because the corresponding column vectors (Figure |j) are large. The 
regularization effectively damps the amplitude (right bars) around and above index 15, which has been 
chosen as the degree of freedom here. The significant amplitudes are not affected by the regularization. 

The final result of the example (measured distribution in Figure [j]) is shown in Figure ||. The left 
figure shows 30 data points with error bars together with the original (true) distribution; within errors 
the original distribution is nicely reproduced. The rank of the covariance matrix is about 15, which was 
chosen as the effective number of degrees of freedom; thus inversion of the covariance matrix, needed 
e.g. for a least-square fit of a model to the data, is not possible. Although the large number of 30 data 
points seems to be attractive, the data points should be reduced to 15 data points by combining two 
bins to one, which then have a full-rank covariance matrix. This set of data points is shown in Figure ^ 
(right). The broader bins of this set of data points are a consequence of the limited acceptance and finite 
resolution of the measurement. 

2 In a publication the method has been described to "have certain mathematical complications", but it is based only on 
standard linear algebra of symmetric matrices. 
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